CANONICAL CORRELATION AND THE RELATIONS BETWEEN SETS OF VARIABLES

I. Introduction

Sociologists are becoming increasingly sophisticated in their use of multivariate statistical models, and a number of excellent sources are now available in the literature. (See Van de Geer, 1971; Blalock, 1971; Goldberger and Duncan, 1973.) We wish to direct sociologists' attention to a multivariate statistical technique whose usefulness as a data analysis tool has a great deal of appeal when interest centers around the joint distribution of two or more sets of variables. The technique, canonical correlation, was developed by Hotelling (1935, 1936) more than three decades ago, and is used rather extensively in biometrics and psychometrics. In sociological literature, Klatzky and Hodge (1971) used the technique to analyze intergenerational occupational mobility. Van de Geer (1971) used the technique to estimate the parameters of unobservable variable models, and Hauser and Goldberger (1972) noted the similarities between canonical correlation and confirmatory factor analysis in the estimation of unobservable variable models. However, these applications do not begin to exhaust the potential usefulness of canonical correlation analysis. Some of the data analysis situations for which canonical correlation is appropriate are discussed in this paper.

Consider a situation in which a researcher is in a position to separate the variables of interest to him into sets, and is in a position to postulate "flows" of influence among the variable sets based upon information obtained from a theoretical model. The researcher's primary objective is in determining to what extent and at what point the distributions of these variable sets intersect. Multivariate statistical techniques are designed to provide answers to these types of questions, though perhaps from different data analytic points of view. Moreover, the researcher is interested in answering the following questions as a means of evaluating the plausibility of some specific hypotheses implied in his theoretical model: (1) What is the total relationship between the dependent and the independent variable sets? (2) In instances in which the independent set consists of several theoretically distinct subsets, what is the relative contribution of each subset to the total amount of variation explained in the dependent set? (3) Which variables in the dependent and independent set(s) respectively contributed most to the total amount of variation shared between the sets? Those familiar with univariate correlation and regression analysis will immediately recognize that questions one and two are practically identical to those that one would ask if the relations between individual variables were pursued. Indeed, it has been shown that certain aspects of canonical correlation analysis are simply extensions of univariate correlation theory (Rozeboom, 1965, 19??).

This paper presents a pedagogically oriented review of such statistical literature that has been presented on canonical correlation (see Bartlett, 1967; Anderson, 1957; Morrison, 1967; Cooley and Lohnes, 1971; Van de Geer, 1971). We think that the specific problems explored here, by way of an example, should stimulate a greater interest in the general usefulness of this multivariate statistical technique. Within this context, the current discussion focuses on two specific objectives.

First, as is known by most practicing methodologists, it is suggested that canonical correlation analysis offers a parsimonious way to reduce the complexities involved in relating several dependent variables to several independent variables, particularly when it is appropriate to conceptualize dependent and independent variables respectively as indicators of theoretical constructs. However, it should be noted that the approach employed here has neither the statistical precision nor the theoretical parsimony of simultaneous statistical models, particularly when the research problem calls for their use and their assumptions can be met (see Hauser and Goldberger, 1972; Burt, 1973; Duncan and Goldberger, 1973). On the other hand, it can be argued that deficiencies in the data and/or in the theoretical model should not deter researchers from examining, at least in an exploratory manner, the fruitfulness of a theoretical approach to a subject that is defined as problematic. In this respect, canonical correlation, as it is applied in this paper, can provide the researcher with an alternative whose requirements are less stringent than those characteristic of simultaneous estimation procedures.

The second objective involves an attempt to resolve some of the problems frequently encountered in trying to interpret canonical solutions. It is probably the case that one of the main reasons why canonical correlation is so infrequently used by researchers has to do with the difficulty associated with interpreting canonical roots and vectors. It is suspected that this problem of interpretation arises partly from a lack of appreciation of exactly what is being done when the relationships between sets of variables are subjected to a canonical correlation analysis. We take the position that the interpretation problem can be practically eliminated if it can be shown that canonical correlation is a parsimonious way of decomposing a set of multiple correlation coefficients. Thus, it will be shown that both the canonical coefficients and vectors can be given interpretations that are as meaningful as computing multiple and multiple-partial correlation coefficients.

II. Applications

In this section the particular approach taken toward canonical correlation is applied to a specific research problem addressed by Wilson's (1973) study of the determinants of housing status. The interest is in analyzing the determinants of the quality of housing occupied by primary families who owned their dwelling unit in 1960. The dependent set Y, housing quality, is composed of measures of whether the dwelling unit is in standard condition ($Y_1$), the age of the unit ($Y_2$), and a measure of the quality attributes of the unit ($Y_3$). The independent set Z consists of measures of marital duration ($W_1$), the total number of children present in the family ($W_2$), age of the youngest child ($W_3$), education ($X_1$), occupational prestige ($X_2$), and total family income ($X_3$). The first three variables are defined as measures of family status (W); the latter three are defined as measures of socioeconomic status (X). The observed correlations among these measures are exhibited in Table 1.

Figure 1 summarizes the expected direction of the relationships among the variable sets. It should be noted that the model as diagrammed postulates relationships among theoretical constructs represented by the variable sets (cf. Sullivan, 1972). The reason for this relates to the fact that evaluating the full implication of the model may require the use of more than one canonical solution. Thus, for example, the correlation between Y and its indicators may require two different sets of estimates in order to determine the total effects of W and X.

In any event, the model hypothesizes that the effect of family status on housing quality is negative, largely because of the influence exerted by family size and age of the youngest child. Large families are least likely to be in a position to spend a great deal on housing consumption. Socioeconomic status should have a negative effect on family status, because size of family is inversely related to all three measures of socioeconomic status. Finally, socioeconomic status should have a positive effect on housing quality, because the quality of the dwelling environment should reflect social status considerations.

With respect to the research questions posed earlier, a full evaluation of the implications of the model diagrammed in Figure 1 requires (1) a measure, analogous to a multiple $R^2$, which can be used to summarize the overall predictive ability of the model; (2) measures, analogous to multiple-partial correlation coefficients, which can be used to determine the relative contribution made by the independent subsets W and X to the total variation explained in the dependent set Y; (3) measures that can be used to interpret the direction (positive versus negative) of the relationships between the variable sets; and (4) measures, when used in conjunction with those in (3), that can aid in determining which measured indicator variable(s) of the respective sets played significant role(s) in determining the overall relationships between the variable sets.

Matrix notation is employed throughout this exposition in order to clarify and enhance the derivations of specific measures. For illustrative purposes, let Y represent a $P_Y \times N$ matrix ($P_Y = 3$), W a $P_W \times N$ matrix ($P_W = 3$), X a $P_X \times N$ matrix ($P_X = 3$), and Z a $P_Z \times N$ matrix ($P_Z = P_W + P_X = 6$). Note further that N refers to sample size, and $P_j$ refers to the number of variables in each set, respectively. Assuming all variables are expressed in standard form, the relationship between sets Y and Z can be expressed in terms of the following equations:

$$U = A'Y, \qquad V = B'Z, \tag{1}$$

where U and V are $K \times N$ matrices of canonical variates and A and B are the corresponding matrices of canonical weights. The rows of U and V are linear combinations of the variables in sets Y and Z respectively. The relationship between the linear combinations in U and V can be expressed in terms of a canonical correlation coefficient. There are K such canonical coefficients possible. The problem addressed by canonical correlation

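reduces to finding: (1) the matrices A and B of canonical weights, and (2) a $K \times 1$ column vector C, with elements $c_j$ ($j = 1, \ldots, K$), which are correlations between linear combinations of the variables in set Y with those in set Z. In order to find the vector C and the matrices A and B, form the products

$$\begin{bmatrix} Y \\ Z \end{bmatrix} \begin{bmatrix} Y' & Z' \end{bmatrix} = \begin{bmatrix} YY' & YZ' \\ ZY' & ZZ' \end{bmatrix},$$

multiplying by 1/N,

$$\frac{1}{N} \begin{bmatrix} YY' & YZ' \\ ZY' & ZZ' \end{bmatrix} = \begin{bmatrix} R_{YY} & R_{YZ} \\ R_{ZY} & R_{ZZ} \end{bmatrix},$$

and solve the following set of homogeneous equations:

$$\left( R_{YY}^{-1} R_{YZ} R_{ZZ}^{-1} R_{ZY} - \lambda_j I \right) A_j = 0 \tag{2}$$

$$\left( R_{ZZ}^{-1} R_{ZY} R_{YY}^{-1} R_{YZ} - \mu_j I \right) B_j = 0 \tag{3}$$

where

$\lambda_j$ and $\mu_j$ are characteristic roots ($j = 1, \ldots, K$),

I is the identity matrix,

$A_j$ and $B_j$ are $P \times 1$ column vectors of canonical weights ($j = 1, \ldots, K$) (these vectors are the transpose of the row vectors in $A'$ and $B'$),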

and where the following constraints are imposed:

(1) $R_{YY}$ is of full rank, i.e., $(R_{YY})^{-1}$ exists.

(2) Y is a $P_Y \times N$ matrix and Z is a $(P_W + P_X) \times N = P_Z \times N$ matrix.

(3) The first $K = \min(P_Y, P_Z)$ characteristic roots of $(R_{YY})^{-1} R_{YZ} (R_{ZZ})^{-1} R_{ZY}$ are distinct. (If $P_Z > P_Y$, then not all of the roots extracted from $(R_{ZZ})^{-1} R_{ZY} (R_{YY})^{-1} R_{YZ}$ are distinct. The number of nondistinct roots will be equal to $P_Z - P_Y$.)

(4) $A_j' R_{YY} A_j = 1$ and $B_j' R_{ZZ} B_j = 1$, in order that the canonical variates in U and V are in standard form.







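Applying equations (2) and (3) to the observed correlation matrix displayed in Table 1,

$$C^2 = \begin{bmatrix} .178 \\ .135 \\ .003 \end{bmatrix}, \qquad C = \begin{bmatrix} .422 \\ .368 \\ .055 \end{bmatrix},$$

$$A = \begin{bmatrix} .365 & -.103 & .930 \\ .137 & .946 & -.103 \\ .921 & -.315 & -.353 \end{bmatrix}, \qquad B = \begin{bmatrix} .022 & .912 & .527 \\ -.352 & .018 & -1.312 \\ .053 & -.020 & .485 \\ .448 & .050 & -.047 \\ .388 & -.094 & -.339 \\ .526 & -.070 & .125 \end{bmatrix}.$$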
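The eigenproblem in equations (2) and (3) can be sketched numerically. Table 1 is not reproduced in this section, so the sketch below runs on synthetic standardized data; the function name `canonical_solution` and the simulated data are assumptions made for illustration, not the original computations.

```python
import numpy as np

def canonical_solution(R, p_y):
    """Solve equations (2) and (3) for a partitioned correlation matrix R.

    The first p_y rows and columns of R belong to set Y, the rest to set Z.
    Returns the roots of (2), the roots of (3), and the weight matrices A, B.
    """
    R_yy, R_yz = R[:p_y, :p_y], R[:p_y, p_y:]
    R_zy, R_zz = R[p_y:, :p_y], R[p_y:, p_y:]
    k = min(p_y, R.shape[0] - p_y)

    # Matrices whose characteristic roots appear in equations (2) and (3).
    M_y = np.linalg.solve(R_yy, R_yz) @ np.linalg.solve(R_zz, R_zy)
    M_z = np.linalg.solve(R_zz, R_zy) @ np.linalg.solve(R_yy, R_yz)

    def leading(M, R_within):
        roots, vectors = np.linalg.eig(M)
        roots, vectors = roots.real, vectors.real
        order = np.argsort(roots)[::-1][:k]           # keep the k largest roots
        roots, vectors = roots[order], vectors[:, order]
        # Constraint (4): rescale each weight vector so its variate has unit variance.
        variances = np.einsum("ij,ik,kj->j", vectors, R_within, vectors)
        return roots, vectors / np.sqrt(variances)

    roots_y, A = leading(M_y, R_yy)
    roots_z, B = leading(M_z, R_zz)
    # Eigenvectors are defined only up to sign; orient B so every c_j is positive.
    B = B * np.sign(np.diag(A.T @ R_yz @ B))
    return roots_y, roots_z, A, B


# Synthetic standardized data laid out as in the text (variables in rows).
rng = np.random.default_rng(0)
n = 500
Z = rng.standard_normal((6, n))
Y = np.array([[0.8], [0.5], [0.2]]) * Z[:3] + rng.standard_normal((3, n))
R = np.corrcoef(np.vstack([Y, Z]))

roots_y, roots_z, A, B = canonical_solution(R, p_y=3)
C = np.sqrt(roots_y)                                   # canonical correlations
print(np.round(roots_y, 3), np.round(roots_z, 3))      # the two sets of roots
```

On any such data the two sets of leading roots coincide, and $A' R_{YZ} B$ is diagonal with the canonical correlations on the diagonal.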



It can be observed that the characteristic roots of equations (2) and (3) are identical and are the squared canonical correlation coefficients. Since all of the canonical coefficients are significant beyond the (.01) level of rejection using Wilks' lambda (Bartlett, 19??, 1974), we are confronted with the problem of interpreting the substantive significance of at least the first two canonical coefficients.



A. Multiple Coefficients

The key to interpreting canonical coefficients is recognition of the fact that they are defined as the correlations between linear combinations of the original variables in sets Y and Z and not the correlations between the original variables themselves. Thus, each squared canonical coefficient is a measure of a certain amount of the total variation shared between two sets of variables. A measure of the total amount of variation shared between two sets of variables can also be obtained, which is analogous to but not identical with the squared product moment correlation coefficient, or with the squared multiple correlation coefficient (when the independent set is composed of two or more independent variable subsets). This coefficient has been termed the Squared Vector Multiple Correlation Coefficient (hereafter referred to as SVMC) (Srikantan, 1970). The coefficient SVMC is defined (Rozeboom, 1965, 19??; Srikantan, 1970) as

$$\mathrm{SVMC} = 1 - \prod_{j=1}^{K} (1 - c_j^2), \tag{4}$$

where $\prod$ indicates sequential multiplication.

$$\mathrm{VCA} = \prod_{j=1}^{K} (1 - c_j^2)$$

is the Vector Coefficient of Alienation, or the vector correlation between Y and the residual of $Y - \hat{Y}$, where $\hat{Y}$ are least squares estimates of the variables in Y. Thus, the correlation between Y and $Y - \hat{Y}$ is also a canonical relationship that conforms to equations (2) and (3). Therefore,

$$\mathrm{SVMC} = 1 - \mathrm{VCA}.$$



If the researcher is interested in estimating the total amount of variation shared between two sets of variables, SVMC is the appropriate measure. With respect to the first research question posed earlier,

$$\mathrm{SVMC} = 1 - .709 = .291,$$

which suggests that 29 percent of the variation of the variates in set U can be explained by the variates in set V. Note particularly that the interpretation is applied to the variates and not the original set of variables.
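The SVMC figure can be checked directly from the squared canonical correlations reported above. The following is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Squared canonical correlations (the vector C^2) as reported in the text.
c_squared = np.array([0.178, 0.135, 0.003])

vca = float(np.prod(1.0 - c_squared))   # Vector Coefficient of Alienation
svmc = 1.0 - vca                        # equation (4)

print(f"VCA = {vca:.3f}, SVMC = {svmc:.3f}")
```

The product runs over all K roots, so even the small third root (.003) enters the measure.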

Srikantan (1970) presents two other multiple canonical coefficients that may be appropriate for some research problems. However, we favor SVMC because it is a direct extension of the squared product moment correlation coefficient. The major disadvantage of all of these measures is that their interpretations are not necessarily equivalent to the proportion of variation in the variables of set Y that can be explained by the variables in set Z. Measures that permit this type of interpretation are available, and are our next topic of discussion. (See Stewart and Love, 1968; Miller and Farr, 1971; Alpert and Peterson, 1972; Wood, 1972.)

It was noted previously that the number of nonzero and positive $c_j^2$ values derived from equation (2) is determined by the rank of the variance-covariance matrix (the correlation matrix in the example) associated with the smallest variable set. For example, if the Y matrix contains three variables and the Z matrix six, the number of nonzero and positive $c_j^2$ values is limited to three (although one, and perhaps all three, may not be statistically significant). Consequently, it is statistically possible to explain all the variation in the variables in set Y but only 50 percent of the variation in the variables of set Z (Alpert and Peterson, 1972). One aspect of the interpretation problem alluded to earlier with respect to canonical correlation is the symmetric character of the squared canonical correlation coefficients and their multiples. Thus, our immediate objective is to develop an asymmetric measure of explained variation, which is analogous to the squared multiple correlation coefficient. It will be recalled that the squared multiple correlation coefficient is a measure of the amount of variation in a given variable that can be explained by a linear combination of predicting variables. Stewart and Love (1968), and Miller and Farr (1971) have developed a measure for canonical analysis that is analogous to the squared multiple correlation coefficient and can be interpreted as the proportion of the variation in set Y which can be explained by set Z. We will denote these measures as ${}_dR^2_{y \cdot z}$ when the emphasis is on explaining the variation in set Y, and ${}_dR^2_{z \cdot y}$ when the emphasis is on explaining the variation in set Z. In general,

$${}_dR^2_{y \cdot z} \neq {}_dR^2_{z \cdot y}.$$

It is this asymmetric quality of this measure (hereafter referred to as total redundancy) that makes it a more useful measure than either $c_j^2$ or its multiples. As a measure of association, it has the following desirable qualities: (1) ${}_dR^2_{y \cdot z}$ will be zero if and only if $R_{YZ} = 0$, and (2) it will achieve a value of 1 if and only if the variation in each of the $P_Y$ variables can be completely explained by the variations in Z, i.e., $R^2_{y_i \cdot z} = 1$.

For illustrative purposes, we shall focus mainly on the derivation of ${}_dR^2_{y \cdot z}$, since ${}_dR^2_{z \cdot y}$ can be obtained in a similar manner. It can be shown that ${}_dR^2_{y \cdot z}$ is an arithmetic average of the squared multiple correlation coefficients obtained from predicting each Y variable from all of the variables in Z. First, we define the $P_Y \times K$ matrices $R_{YU}$ and $R^2_{YU}$:

$$R_{YU} = R_{YY} A = \begin{bmatrix} .387 & .044 & .921 \\ .347 & .938 & -.020 \\ .919 & .155 & -.362 \end{bmatrix}$$

and

$$R^2_{YU} = \begin{bmatrix} .150 & .002 & .848 \\ .121 & .879 & .000 \\ .845 & .024 & .131 \end{bmatrix}.$$

Similarly,

$$R_{ZV} = R_{ZZ} B = \begin{bmatrix} .093 & .992 & .014 \\ -.115 & .472 & .750 \\ -.005 & .201 & -.241 \\ .724 & .331 & .237 \\ .752 & -.035 & .348 \\ .717 & .003 & -.017 \end{bmatrix}$$

and

$$R^2_{ZV} = \begin{bmatrix} .008 & .985 & .000 \\ .013 & .225 & .563 \\ .000 & .040 & .058 \\ .524 & .109 & .056 \\ .565 & .001 & .121 \\ .514 & .000 & .000 \end{bmatrix}.$$



The $r^2_{y_i,u_j}$ and $r^2_{z_i,v_j}$ elements in $R^2_{YU}$ and $R^2_{ZV}$, respectively, are defined as the proportion of the variation in the $i$th variable in Y or Z that can be explained by the $j$th canonical variate in U or V, respectively. Postmultiplying $R^2_{YU}$ and $R^2_{ZV}$ by the $K \times 1$ column vector $C^2$ (the vector of squared canonical correlation coefficients), we have

$$R^2_{YU} C^2 = \begin{bmatrix} .150 & .002 & .848 \\ .121 & .879 & .000 \\ .845 & .024 & .131 \end{bmatrix} \begin{bmatrix} .178 \\ .135 \\ .003 \end{bmatrix} = \begin{bmatrix} .030 \\ .140 \\ .154 \end{bmatrix} = Q_Y$$

and

$$R^2_{ZV} C^2 = \begin{bmatrix} .008 & .985 & .000 \\ .013 & .225 & .563 \\ .000 & .040 & .058 \\ .524 & .109 & .056 \\ .565 & .001 & .121 \\ .514 & .000 & .000 \end{bmatrix} \begin{bmatrix} .178 \\ .135 \\ .003 \end{bmatrix} = \begin{bmatrix} .135 \\ .050 \\ .007 \\ .108 \\ .101 \\ .093 \end{bmatrix} = Q_Z,$$

each of which



yields a $P \times 1$ column vector of squared multiple correlation coefficients. Multiplying further by a $P \times 1$ unity vector T (that is, forming $T'Q_Y$) yields

$$R^2_{y \cdot z} = \sum_{i=1}^{P_Y} R^2_{y_i \cdot z}.$$

Inasmuch as $R^2_{y \cdot z}$ is simply the sum of the $R^2_{y_i \cdot z}$ values predicting each variable in Y given the variables in Z, it is possible that the former can achieve a value greater than one. The maximum value of $R^2_{y \cdot z}$ is equal to $\mathrm{Tr}(R_{YY})$, or the number of variables in Y. Ideally, one would want to employ a measure of explained variation that conforms to the limits of (0, 1), which makes $R^2_{y \cdot z}$ less attractive as a measure of association. The asymmetric measure ${}_dR^2_{y \cdot z}$ corrects for this undesirable quality by dividing $R^2_{y \cdot z}$ by the number of variables in Y. Total redundancy can thus be defined as

$${}_dR^2_{y \cdot z} = \frac{1}{P_Y} \sum_{j=1}^{K} c_j^2 \, a_j' a_j = \frac{1}{P_Y} \sum_{j=1}^{K} \sum_{i=1}^{P_Y} c_j^2 \, r^2_{y_i,u_j} = \frac{1}{P_Y} \sum_{i=1}^{P_Y} R^2_{y_i \cdot z}$$

$$= (.030 + .140 + .154)/3 = .108,$$

where the $r_{y_i,u_j}$'s are elements of $R_{YU}$ and the $a_j$ are the $P_Y \times 1$ column vectors of $R_{YU}$.
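The redundancy arithmetic above can be retraced with a few lines of code. This sketch starts from the squared loadings $R^2_{YU}$ and the vector $C^2$ as printed in the text, so results agree with the reported figures only to rounding; the variable names are illustrative.

```python
import numpy as np

# Squared loadings of the Y variables on the canonical variates (R^2_YU).
R2_yu = np.array([[0.150, 0.002, 0.848],
                  [0.121, 0.879, 0.000],
                  [0.845, 0.024, 0.131]])
# Squared canonical correlations (C^2).
c_squared = np.array([0.178, 0.135, 0.003])

Q_y = R2_yu @ c_squared             # squared multiple correlation of each Y_i on Z
sum_r2 = Q_y.sum()                  # unnormalized sum, bounded by the number of Y variables
redundancy = sum_r2 / len(Q_y)      # total redundancy of Y given Z

print(np.round(Q_y, 3), round(float(redundancy), 3))
```

Dividing by the number of Y variables is what keeps the measure within the limits of zero and one.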

The size of the multiple redundancy measure indicates that socioeconomic status and family status combined explain 11 percent of the variation in the measures of housing quality. However, inasmuch as the theoretical model postulates asymmetric relationships among the variable sets, this measure is of little use in this respect. The measures most relevant for the task are the multiple-partial measures of redundancy, which are developed below.

B. Multiple-Partial Coefficients

In instances in which the independent variable set can be decomposed into subsets, we can define a set of multiple-partial coefficients. These coefficients can be used to determine the relative contribution made by each subset of Z to the total amount of variation explained in set Y. In the example, the independent set Z is composed of two sets of independent variables, i.e., the subset W of family status variables, and the subset X of socioeconomic status variables. The first step in the computation of the multiple-partial coefficients involves computing the redundancy measures ${}_dR^2_{y \cdot w}$ and ${}_dR^2_{y \cdot x}$, which indicate the amount of variation in set Y that can be explained by sets W and X separately. Once this is accomplished, ${}_dR^2_{y \cdot z}$ can be decomposed into the following components:



(1) $${}_dR^2_{y \cdot w(x)} = \frac{{}_dR^2_{y \cdot z} - {}_dR^2_{y \cdot x}}{1 - {}_dR^2_{y \cdot x}} = (.108 - .070)/(1 - .070) = .041,$$

which indicates that the relative contribution of family status to the total amount of variation explained in housing quality is (.041) or 38 percent [(.041/.108) 100].

(2) $${}_dR^2_{y \cdot x(w)} = \frac{{}_dR^2_{y \cdot z} - {}_dR^2_{y \cdot w}}{1 - {}_dR^2_{y \cdot w}} = (.108 - .047)/(1 - .047) = .064,$$

which indicates that the relative contribution of socioeconomic status to the total amount of variation explained in housing quality is (.064) or 59 percent [(.064/.108) 100].

(3) Finally, we should point out that components (1) and (2) define the "unique" contribution of sets W and X. It is statistically possible that some portion of the total variation explained in set Y by sets W and X might represent the combined effect of these independent subsets. This can occur when the independent subsets are highly interrelated and therefore may exert common influence. (See Duncan, 1970; Coleman, 1970, for examples.) The third component can be derived as a residual,

$${}_dR^2_{y \cdot z} - \left( {}_dR^2_{y \cdot x(w)} + {}_dR^2_{y \cdot w(x)} \right) = .108 - (.064 + .041) = .003,$$

which in our case is very small. The reader should note, however, that while the application of the above decomposition to situations in which there are more than two independent subsets might appear straightforward, it may be more difficult to interpret component (3), because this component would then be equal to the sum of all possible nonredundant combinations of covariations existing between the subsets.
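The three-part decomposition can be retraced as follows. This is a minimal sketch: the helper `multiple_partial` is a hypothetical name, and reading .070 and .047 as the redundancies of Y given X alone and given W alone is an inference from the worked figures in components (1) and (2).

```python
def multiple_partial(total, other):
    """Redundancy attributable to one subset after removing the other subset's share."""
    return (total - other) / (1.0 - other)

r_yz = 0.108   # total redundancy of Y given Z = (W, X)
r_yx = 0.070   # redundancy of Y given X alone (inferred from component 1)
r_yw = 0.047   # redundancy of Y given W alone (inferred from component 2)

unique_w = multiple_partial(r_yz, r_yx)   # component (1): family status
unique_x = multiple_partial(r_yz, r_yw)   # component (2): socioeconomic status
joint = r_yz - (unique_w + unique_x)      # component (3): residual, shared part

for name, value in [("W", unique_w), ("X", unique_x), ("joint", joint)]:
    print(f"{name}: {value:.3f} ({100 * value / r_yz:.0f} percent of total)")
```

By construction the three components sum to the total redundancy, so the residual absorbs whatever the two unique parts leave unexplained.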

The multiple-partial measures of redundancy provide the answer to the second question posed earlier. Clearly, the relationships of socioeconomic and family status with housing quality, though small, are nonzero. But the theoretical model postulates not only that the observed relationships are nonzero but also that they should be in a specific direction. With the multiple-partials, we can only say that the relationships are of a certain size; we cannot say whether they imply positive or negative relationships. This applies as well to the other canonical coefficients discussed earlier and largely results from the way in which these coefficients are computed. The direction of the variable-set relationships and the issue of which specific variables within the dependent and independent sets, respectively, are responsible for the total relations between variable sets can be determined by further manipulating the $r_{y_i,u_j}$ and $r_{z_i,v_j}$ elements of $R_{YU}$ and $R_{ZV}$, respectively.

C. Canonical Variate-Observed Variable Relations

If we used all of the information obtained from the matrices $R_{YU}$ and $R_{ZV}$ and the vector C, a more precise description of the relationships between socioeconomic and family status and housing quality would be as depicted in Figure 2, where the relations between the variates ($c_j$) are determined by applying constraints (4) and (5), and the relations between variates and indicators are defined as $r_{y_i,u_j}$ and $r_{z_i,v_j}$.

A useful indicator of between-set relationships is the sign and size of the $r_{y_i,u_j}$ and $r_{z_i,v_j}$ values. If we wanted to relate a variable in set Y with a variable in set Z, the signs of the $r$ values are important, because they indicate the direction of the association between the two variables as measured by the product moment correlation coefficient. Indeed, the product moment correlation coefficient has simply been subjected to decomposition and can be estimated from the following equation:

$$r_{y_i,z_l} = \sum_{j=1}^{K} r_{y_i,u_j} \, c_j \, r_{z_l,v_j}. \tag{5}$$

Applying this equation to the relations between $Y_2$ and ($Z_1 = W_1$), we have

$$r_{y_2,z_1} = .347(.422)(.093) + .938(.368)(.992) + (-.020)(.055)(.014) = .013 + .342 - .000 = .356.$$
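The worked example for equation (5) can be reproduced from the three loadings of $Y_2$, the three loadings of $W_1$, and the canonical correlations quoted in the text. The variable names below are illustrative.

```python
import numpy as np

c = np.array([0.422, 0.368, 0.055])        # canonical correlations (vector C)
r_y2_u = np.array([0.347, 0.938, -0.020])  # loadings of Y_2 (age of unit) on U
r_z1_v = np.array([0.093, 0.992, 0.014])   # loadings of Z_1 = W_1 (marital duration) on V

terms = r_y2_u * c * r_z1_v                # one term per canonical relationship
r_y2_z1 = float(terms.sum())               # equation (5)

print(np.round(terms, 4), round(r_y2_z1, 3))
```

Nearly all of the reproduced correlation comes from the second term, that is, from the second canonical relationship.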

More generally, the matrix $R_{YZ}$ can be reproduced by applying the following equation:

$$R_{YZ} = R_{YU} \, D \, R_{ZV}',$$

where $R_{YU}$ and $R_{ZV}$ are $P_Y \times K$ and $P_Z \times K$ matrices respectively, and D is a diagonal matrix with the canonical correlations in the $K \times 1$ column vector C as elements.
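The matrix form of the decomposition can be verified on synthetic data. Note that this sketch obtains the canonical solution by a singular value decomposition of the whitened between-set matrix, a standard equivalent route that is swapped in here for brevity rather than the eigenproblem of equations (2) and (3); the data and the helper `inv_sqrt` are assumptions for illustration.

```python
import numpy as np

def inv_sqrt(R):
    """Symmetric inverse square root of a positive definite matrix."""
    values, vectors = np.linalg.eigh(R)
    return vectors @ np.diag(values ** -0.5) @ vectors.T

# Synthetic standardized data with variables in rows, as in the text.
rng = np.random.default_rng(1)
n = 400
Z = rng.standard_normal((6, n))
Y = 0.6 * Z[:3] + rng.standard_normal((3, n))
R = np.corrcoef(np.vstack([Y, Z]))
R_yy, R_yz, R_zz = R[:3, :3], R[:3, 3:], R[3:, 3:]

W_y, W_z = inv_sqrt(R_yy), inv_sqrt(R_zz)
P, c, Qt = np.linalg.svd(W_y @ R_yz @ W_z, full_matrices=False)
A = W_y @ P            # canonical weights for Y (P_Y x K)
B = W_z @ Qt.T         # canonical weights for Z (P_Z x K)

R_yu = R_yy @ A        # loadings of Y variables on U
R_zv = R_zz @ B        # loadings of Z variables on V
R_yz_rebuilt = R_yu @ np.diag(c) @ R_zv.T

print(np.max(np.abs(R_yz_rebuilt - R_yz)))   # reconstruction error
```

Because K equals the number of Y variables here, the squared loadings in each row of `R_yu` also sum to one, as they do in the $R^2_{YU}$ matrix reported earlier.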

Thus, the signs of the $r_{y_i,u_j}$ and $r_{z_i,v_j}$ values can be used to determine the general direction of the relationships between the variable sets. On the other hand, the sizes of these values are poor indicators of between-set relationships by themselves because they only indicate the contribution made by the $i$th variable in sets Y or Z to the total amount of variation extracted by the canonical variates from all the variables in each set, respectively. If they are weighted by the squared canonical correlation coefficients, they provide some indication of the amount of variation explained in the $i$th variable of one set given all the variables in the other set via the $j$th canonical relationship. The sum of these values for each variable across the K canonical relationships is equal to the squared multiple correlation coefficient for that variable given the variables in the other set. The reader will recall that the $P \times 1$ column vector Q of squared multiple correlation coefficients was obtained as $Q_Y = R^2_{YU} C^2$. Now we wish to decompose each of the squared multiple correlation coefficients into an additive set of values that can be associated with each canonical variate extracted from set Y. Thus, if we multiply each $r^2_{y_i,u_j}$ value in $R^2_{YU}$ by the $c_j^2$ value it is associated with, we have

$$l^2_{y_i,u_j} = r^2_{y_i,u_j} \, c_j^2,$$

which is a measure of the amount of variation explained in the $i$th variable of set Y by the variables in set Z via the $j$th canonical variate. (In the language of factor analysis, $l^2_{y_i,u_j}$ is simply the square of the loading of the $i$th variable on the $j$th factor.) It should be obvious that, by definition,

$$\sum_{j=1}^{K} l^2_{y_i,u_j} = R^2_{y_i \cdot z}$$

and

$$\frac{1}{P_Y} \sum_{i=1}^{P_Y} \sum_{j=1}^{K} l^2_{y_i,u_j} = {}_dR^2_{y \cdot z}.$$
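The additive decomposition can be tabulated for the housing quality measures. This sketch again starts from the printed $R^2_{YU}$ and $C^2$ values, so it matches the reported figures only to rounding; the names `l2` and `share` are illustrative.

```python
import numpy as np

R2_yu = np.array([[0.150, 0.002, 0.848],    # Y_1: standard condition
                  [0.121, 0.879, 0.000],    # Y_2: age of unit
                  [0.845, 0.024, 0.131]])   # Y_3: quality attributes
c_squared = np.array([0.178, 0.135, 0.003])

# l^2[i, j]: variation in Y_i explained by Z through the j-th canonical variate.
l2 = R2_yu * c_squared

R2_each = l2.sum(axis=1)                    # squared multiple correlation of each Y_i on Z
share = l2 / R2_each[:, None]               # l^2 as a proportion of each R^2
redundancy = l2.sum() / l2.shape[0]         # total redundancy of Y given Z

print(np.round(l2, 3))
print(np.round(share, 2))
```

Reading across a row of `share` shows which canonical relationship accounts for the explained variation in that variable; for age of unit it is the second.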



The $l^2_{y_i,u_j}$ (or $l^2_{z_i,v_j}$) values, then, not only provide us with a means of determining which variable in each set made the largest contribution to the $j$th canonical relationship, but also indicate what proportion of the total variation explained in a given variable can be associated with the $j$th canonical relationship. Thus, the total redundancy measure ${}_dR^2_{y \cdot z}$ can be used to estimate the total amount of variation in Y that can be explained by Z, and its decomposition into an additive set of values permits the determination of which variable in Y is actually being explained.

Table 2 reports the empirical estimates derived from the measures we have discussed in this chapter. The last column in the table reports the multiple and multiple-partial redundancy measures, whose relative sizes suggest that both socioeconomic and family status are related to the quality of the housing environment inhabited by owner households. As was noted earlier, our objective is to determine not only whether socioeconomic status and family status are related to housing quality, but also whether the hypothesized directions of these relationships are confirmed by the data. We noted that the overall direction of the relationships between sets can only be determined by analyzing the signs and relative sizes of the relationships between the observed measures and the canonical variates. For each canonical solution extracted from equation (2), Table 2 reports $r$ and $l^2$ values for each of the measured variables. As a further aid to interpretation, the third column under each canonical solution reports the $l^2$ values as proportions of the total variation explained in each of the variables (as represented by multiple $R^2$ coefficients).

From Table 2 it can be observed that socioeconomic status appears to be related to housing quality because of the positive relationships between measures of the former and condition and quality attributes of dwellings. This observation is supported by the values reported under the first canonical solution, in which the signs of the coefficients are all positive and the $l^2/R^2$ values are at least (.SS). The first canonical solution captures practically all of the covariation that exists between socioeconomic status and housing quality. Thus, with respect to this relationship our expectations are confirmed.

The relationship between family status and housing quality emerges in the second canonical solution. Again, using the $l^2/R^2$ values as the basis for evaluation, it is evident that age of dwelling is being explained by marital duration and number of children. Clearly, the bases of the relationships that housing quality bears with socioeconomic status and family status are not the same. Moreover, it should be equally clear that our expectations in regard to the underlying reasons for the relationship between housing quality and family status are not confirmed. We postulated a negative relationship because it was suggested that large families are more likely to live in poorer quality housing. We find, on the other hand, that the relationship is positive and it is marital duration, not age and number of children, that is the basis for this relationship. These results are consistent with the argument that families age with their units.

It was predicted that socioeconomic status would be negatively related to family status because of the inverse relationship between size of family and the three measures of socioeconomic status. These relationships are reported in Table 3. Socioeconomic status explained an average of 9 percent of the variation in family status. Moreover, it is clearly evident that the positive relationship between marital duration and education is responsible for the overall relationship between these variable sets. The relationship between education and number of children, though small, is positive, while income and occupational prestige seem to bear no relationship to this variable. Finally, family income appears to be positively related to age of youngest child with respect to both the first and second canonical solutions, a relationship which our theoretical model did not predict.

III. Discussion

One of the main reasons why these particular sets of variables
were chosen in order to demonstrate the utility of canonical correlation
analysis relates to the structure of the observed correlation matrix.
First, the within-set and between-set correlations are rather small,
which is due in part to the particular manner in which these variables
(particularly the measures of housing quality) were operationalized via
the census. Even given these low values and the exploratory nature of
the theoretical model under review, it would still be of some interest
to determine the reasonableness of the model in terms of whether it
warrants further investigation. The conceptualization of the observed
variables as indicators of specific theoretical constructs would appear
to this writer to be a reasonable approach to take toward these data.
This is the primary reason why the model as depicted in Figure 1 is
defined in terms of the relationships between sets of variables,
although we were also interested in the issue of which variables within
each set were responsible for the between-set relationships. Moreover,
it should be apparent that a canonical solution is derived mainly from
the between-set correlation matrix R, and the correlations between
variates and indicator variables are largely a function of the structure
of this matrix. Thus, the attempt here was not to find the optimal
correlation between a theoretical construct and its indicators, but rather
to simply summarize the relationships between variable sets without
implying that an optimal set of relations was obtained. Admittedly,
this goal is less ambitious and less parsimonious than what would be
obtained using a simultaneous estimation procedure.
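
The dependence of a canonical solution on the between-set correlation
matrix can be made concrete with a small numerical sketch. The code
below is an illustration only, not the computation used for the tables:
it assumes the standard eigenvalue formulation of canonical correlation,
and the function name, dictionary keys, and two-variable correlation
matrices are hypothetical.

```python
import numpy as np

def canonical_solution(R11, R22, R12):
    """Canonical correlations, weights and structure correlations
    from within-set (R11, R22) and between-set (R12) correlation matrices."""
    R11, R22, R12 = (np.asarray(m, dtype=float) for m in (R11, R22, R12))
    R21 = R12.T
    # Eigenproblem for the first set: R11^{-1} R12 R22^{-1} R21 a = rho^2 a
    M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R21)
    eigvals, vecs = np.linalg.eig(M)
    k = min(R11.shape[0], R22.shape[0])
    order = np.argsort(-eigvals.real)[:k]
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    A = vecs.real[:, order]
    # Scale each weight vector so its variate has unit variance (a' R11 a = 1).
    A = A / np.sqrt(np.sum(A * (R11 @ A), axis=0))
    # Second-set weights follow from the first; assumes every rho is nonzero.
    B = np.linalg.solve(R22, R21 @ A) / rho
    return {
        "rho": rho,
        "A": A,
        "B": B,
        "loadings_1": R11 @ A,  # set-1 indicators with set-1 variates
        "loadings_2": R22 @ B,  # set-2 indicators with set-2 variates
        "cross_1": R12 @ B,     # set-1 indicators with set-2 variates
        "cross_2": R21 @ A,     # set-2 indicators with set-1 variates
    }

# Hypothetical correlations for two indicators per set.
R11 = np.array([[1.0, 0.3], [0.3, 1.0]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])
R12 = np.array([[0.4, 0.1], [0.2, 0.3]])
sol = canonical_solution(R11, R22, R12)
print(np.round(sol["rho"], 3))
```

Changing only the entries of R12 in this sketch changes both the
canonical correlations and every variate-indicator correlation, which
is the sense in which the solution is driven by the between-set matrix.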

However, viewed from another angle, the technique employed
presents a clear picture of the complexity of the relationships between
the dependent set and each of the independent sets. We were able to
detect the fact that measures of socioeconomic status and family status
are differentially related to measures of housing quality. What this
implies essentially is that if housing quality were related separately
to socioeconomic and family status, different variables in the former
set would have emerged as being largely responsible for the total
relationship between the variable sets. In other words, the correlations
between indicator variables and canonical variates would vary depending
on the nature of the variables in each set. This is an undesirable
state of affairs, because unless we can assume that the effects of
indicator variables within each independent set are the same with
respect to each indicator in the dependent set, there is no single "best"
estimate of the unobserved-unobserved correlations and the
unobserved-indicator correlations. For example, if the first canonical
solution is taken as the best overall estimate of the relationship of
housing quality with socioeconomic status and family status, then we
would have virtually eliminated the relationship between housing quality
and family status, since that relationship emerges in the second canonical
solution, not the first.

The problem of differential association between dependent and
independent sets is likely to increase in complexity as the number of
independent sets is increased, which, in some instances, necessitates
the application of less restrictive and less precise statistical
models in order to evaluate the implications of the researcher's
theoretical model. Thus, our main argument is simply that the measures
we have proposed here can be used to partially overcome this problem
when more sophisticated and restrictive statistical models should not
be applied.

The measure "quality attributes of the dwelling unit" is defined as that
proportion of value of property which remains after eliminating from it
the effects of its measured determinants. (See Wilson, 1973.)

Age of dwelling unit, age of youngest child, total number of persons in
the family, education, and occupational prestige (Duncan scale) are
expressed in logarithms. The generalized least squares estimate of units
in standard condition is employed. This estimate takes the form:



$$ Y_i \Big/ \left[ \hat{P}\,(1 - \hat{P}) \right]^{1/2} $$

where

$Y_i$ is the observed (0, 1) value of the variable, and

$\hat{P}$ is the OLS estimate of the probability of being in a standard
unit.
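
The weighting described in this note can be sketched as a two-step
procedure, assuming it follows the usual generalized least squares
correction for a (0, 1) dependent variable: fit OLS, then divide by the
estimated standard deviation of a binary outcome. The predictor matrix,
the function name, and the clipping bound `eps` (used to keep fitted
probabilities inside the unit interval) are assumptions of this sketch,
not details given in the text.

```python
import numpy as np

def gls_weighted_binary(y, X, eps=0.05):
    """Return Y_i / sqrt(P_i (1 - P_i)), with P_i the OLS fitted probability."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    # Step 1: ordinary least squares fit of the (0, 1) variable.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    # Fitted values can fall outside (0, 1); clipping is an assumption here.
    p_hat = np.clip(X1 @ beta, eps, 1.0 - eps)
    # Step 2: rescale by the estimated standard deviation of a binary outcome.
    return y / np.sqrt(p_hat * (1.0 - p_hat)), p_hat

# Hypothetical data: 200 dwelling units, two predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(float)
y_star, p_hat = gls_weighted_binary(y, X)
```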

The data for this analysis are derived from the 1960 Census 1/1,000
Public Use Sample tapes.

The interested reader can find an extensive discussion of derivations in
the technical literature cited earlier.

If one solves equation (2), then the vector B_j is obtained as follows:

B, = — 7 OR B =— 22 25L«J^ . 

J k J U 

Wilks' lambda conforms approximately to the chi-square distribution with
(P1)(P2) degrees of freedom.
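
In practice it is a logarithmic transformation of Wilks' lambda, due to
Bartlett (1941), that is referred to the chi-square distribution. A
minimal sketch, assuming that form of the approximation; the function
names, sample size, and canonical correlations are hypothetical.

```python
import math

def wilks_lambda(canonical_correlations):
    """Wilks' lambda: product of (1 - rho_i^2) over the canonical correlations."""
    lam = 1.0
    for rho in canonical_correlations:
        lam *= 1.0 - rho ** 2
    return lam

def bartlett_chi_square(canonical_correlations, n, p1, p2):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    lam = wilks_lambda(canonical_correlations)
    chi2 = -(n - 1 - (p1 + p2 + 1) / 2.0) * math.log(lam)
    return chi2, p1 * p2

# Hypothetical values: two canonical correlations, 100 cases,
# two variables in each set.
chi2, df = bartlett_chi_square([0.5, 0.2], n=100, p1=2, p2=2)
print(round(chi2, 2), df)
```

The statistic is then compared with a chi-square table at the returned
degrees of freedom.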

The latter is true if and only if the matrix is of full rank; otherwise
more variation can be explained. This is the primary reason why it is
frequently suggested that the number of variables in the dependent set
should be equal to or less than the number of variables in the independent
set.






Alpert, Mark I., and Peterson, Robert A.
    "On the Interpretation of Canonical Analysis." Journal of
    Marketing Research 9 (May 1972): 187-192.
Anderson, T. W.
    Introduction to Multivariate Statistical Analysis. New York:
    John Wiley and Sons, 1958.
Bartlett, M. S.
    "The Statistical Significance of Canonical Correlations."
    Biometrika 32 (January 1941): 29-38.
    "Multivariate Analysis." Journal of the Royal Statistical
    Society Supplement 9 (1947): 176-190.
Blalock, H. M., ed.
    Causal Models in the Social Sciences. Chicago: Aldine-Atherton,
    1971.
Burt, Ronald S.
    "Confirmatory Factor-Analytic Structures and the Theory Construction
    Process." Sociological Methods and Research 2 (November 1973):
    131-190.
Coleman, J. S.
    "Reply to Cain and Watts." American Sociological Review 35
    (April 1970): 242-248.
Cooley, William A., and Lohnes, Paul R.
    Multivariate Procedures for the Behavioral Sciences. New York:
    John Wiley and Sons, 1971.




Duncan, Otis Dudley.
    "Partials, Partitions, and Paths." In Sociological Methodology,
    edited by Edgar F. Borgatta and George Bohrnstedt. San
    Francisco: Jossey-Bass, 1970.
Goldberger, Arthur S., and Duncan, Otis D., eds.
    Structural Equation Models in the Social Sciences. New York:
    Seminar Press, Inc., 1973.
Hauser, Robert M., and Goldberger, Arthur S.
    "The Treatment of Unobservable Variables in Path Analysis."
    In Sociological Methodology, edited by Herbert L. Costner.
    San Francisco: Jossey-Bass, 1971.
Hotelling, Harold.
    "The Most Predictable Criterion." Journal of Educational
    Psychology 26 (February 1935): 139-142.
Klatzky, Sheila, and Hodge, Robert W.
    "A Canonical Correlation Analysis of Occupational Mobility."
    Journal of the American Statistical Association 66 (March 1971).
Morrison, Donald F.
    Multivariate Statistical Methods. New York: McGraw-Hill, 1967.
Rozeboom, William.
    "Linear Correlations Between Sets of Variables." Psychometrika
    30 (March 1965): 57-71.
    "The Theory of Abstract Partials: An Introduction."
    Psychometrika 33 (June 1968): 133-167.






Srikantan, K. S.
    "Canonical Association Between Nominal Measurements." Journal
    of the American Statistical Association 65 (March 1970): 284-292.
Stewart, Douglas, and Love, William.
    "A General Canonical Correlation Index." Psychological Bulletin
    70 (September 1968): 160-163.
Sullivan, John L.
    "Multiple Indicators and Complex Causal Models." In Causal Models
    in the Social Sciences, edited by H. M. Blalock. New York:
    Aldine-Atherton, 1971.
Van de Geer, John P.
    Introduction to Multivariate Analysis for the Social Sciences.
    San Francisco: W. H. Freeman and Company, 1971.
Wilson, Franklin D.
    "Dimensions of Housing Status." Unpublished Doctoral Dissertation,
    Washington State University, Pullman, Washington, 1973.
Wood, Donald A.
    "Toward the Interpretation of Canonical Dimensions." Multivariate
    Behavioral Research 7 (October 1972): 477-482.






